Week 04: Bottom-Up Parsing I & II
Predictive Parser: LL(1)
Predictive Parsing
- Concept
- Look at the next? tokens.
- No backtracking.
- Accept LL(K) grammars. (Left-to-Right, Leftmost derivation, k tokens)
- At each step, only one choice.
- To avoid ambiguousness, need to left-factor the grammar.
- Parsing Table
- Leftmode Non-terminal x Next Input Token
- Use stack to record frontier of parse tree
- Differnce from Recursive Descent
- For the leftmost non-terminal S
- Look at the next input token a
- Choose the production shown at [S,a]
- Reject on reaching error state
- Accept on (end of input && empty stack)
First Set
Def. (First Set)
$$First(X)=\{t\ |\ X\rightarrow^*t\alpha\}\cup\{\epsilon\ |\ X\rightarrow^*\epsilon\}$$
Algo
- $First(X)=\{t\}, if\ t\ is\ a\ terminal.$
- $\epsilon \in First(X), if
\begin{cases}
X\rightarrow \epsilon \\
X\rightarrow A_1A_2\dots A_n, and\ \epsilon \in First(A_i), \forall 1\leq i\leq n
\end{cases}$
- $First(\alpha) \subseteq First(X), if\ X\rightarrow A_1A_2\dots A_n\alpha, and\ \epsilon \in First(A_i), \forall 1\leq i\leq n$
Follow Set
Def. (Follow Set)
$$Follow(X)=\{t\ |\ S\rightarrow^*\beta Xt\delta\}$$
Algo
- $\$ \in Follow(S)$
- $First(\beta)-\{\epsilon\}\subseteq Follow(X), for\ each\ A\rightarrow\alpha X\beta$
- $Follow(A) \subseteq Follow(X), for\ each A\rightarrow\alpha X\beta\ where\ \epsilon\in First(\beta)$
How to Construct LL(1) Parsing Tables
- Construct a parsing table T for CFG G
- For each production $A \rightarrow \alpha$ in G do:
- For each terminal $t \in First(\alpha)$, $T[A, t] = \alpha$
- If $\epsilon \in First(\alpha),$, for each $t \in Follow(A)$, $T[A,t] = \alpha$
- If $\epsilon \in First(\alpha)$ and $ \$ \in Follow(A)$, $T[A,\$] = \alpha$
- If any entry is multiply defined, then G is not LL(1).
Bottom-Up Parsing
Shift-Reduce Parsing
- Concept
- Don't need left-factored grammars.
- Reduce the string to the start symbol. (Inverting production)
- A bottom-up parser traces a rightmost derivation in reverse. $$
\begin{align}
& int*int + int \\
& \textbf{int*T} + int \\
& \textbf{T} + int \\
& T + \textbf{T} \\
& T + \textbf{E} \\
& \textbf{E}
\end{align}
$$
- Thm.
- In some step, let string as $\alpha \beta \gamma$.
- Assume the next reduction is by $X \rightarrow \beta$. ($\alpha \beta \gamma \rightarrow \alpha X \gamma$)
- Then $\gamma$ is a string which contains only terminals.
- Split the string as $L_{Str} | R_{Str}$.
- $L_{Str}$ contains non-terminals and terminals. $R_{Str}$ contains only terminals.
- Actions
- Shift: Move | one place to the right. Shift a terminal to the left string.
- Reduce: Apply an inverse production at the right of $L_{str}$. $$
for\ A \rightarrow xy, Cbxy|ijk \Rightarrow CbA|ijk $$
- Implementation
- Use a stack to maintain $L_{Str}$.
- If is's legal to shift or reduce, there is a shift-reduce conflict.
- If is's legal to reduce by two productions, there is a reduce-reduce conflict.
Term. (Handle)
- Scenario
- Grammar
- $E \rightarrow T+E|E$
- $T \rightarrow int*T|int|(E)$
- At the steop $int|*int+int$
- If we reduce by ($T \rightarrow int$) given ($T | *int+int$), it will fail.
- Target
- Want to reduce only if the result can still be reduced to the start symbol.
- A handle is the rhs of the production that you can trace back in the parse tree.
- $\Rightarrow$ Handles are the children of some internal node of some AST.
- Assume a rightmost derivation $$S \rightarrow^* \alpha X \omega \rightarrow \alpha\beta\omega$$
- Then $\beta$ is a handle of $\alpha \beta \omega$.
- Thm.
- In shift-reduce parsing, handles appear only at the top of the stack, never inside.
- Bottom-up parsing algorithms are based on recognizing handles
Recognizing Handles
- There are no known efficient algo to recognize handles. But there are good heuristics.
- Viable Prefix: $\alpha_1\dots\alpha_n$ is a viable prefix if there is an $\omega$ s.t. $\alpha_1\dots\alpha_n\ |\ \omega$ is a state of a shift-reduce parser.
- A viable prefix is always the prefix of some handle.
- If we can maintain the stack's contents are viable prefixes all the time, no parsing error will occur.
- Thm. For any grammar, the set of viable prefixes is a regular language.
- Item: An item is a production with '.' somewhere on the rhs.
- Example. The items for the production rule $T \rightarrow (E)$ are
- $T \rightarrow .(E)$
- $T \rightarrow (.E)$
- $T \rightarrow (E.)$
- $T \rightarrow (E).$
- Example. The items for ($X \rightarrow \epsilon$) are $X \rightarrow .$
- Items are often called LR(0) items.
- Target: To describe the states of the production rule.
- If we need to use some production rule $R$ to reduce, the top of the stack will be the left string split by '.' on some item of $R$.
- Example.
- Input: $(int)$
- Grammar
- $E \rightarrow T+E\ |\ T$
- $T \rightarrow int*T\ |\ int\ |\ (E)$
- $(E$ is the left string split by '.' on the item $T \rightarrow (E.)$
Recognizing Viable Prefixes
- Add a dummy production $S' \rightarrow S$ to $G$
- The NFA states are the items of $G$
- For item $E \rightarrow \alpha .X\beta$ add transition: $(E \rightarrow \alpha . X \beta) \rightarrow^X (E \rightarrow \alpha X.\beta)$
- For item $E \rightarrow \alpha .X\beta$ and production $X \rightarrow \gamma$, add transition: $(E \rightarrow \alpha . X \beta) \rightarrow^\epsilon (X \rightarrow .\gamma)$
- Every state is an accepting state
- Start state is $S' \rightarrow .S$
Valid Items
- Concept
- Item $X \rightarrow \beta .\gamma$ is valid for a viable prefix $\alpha\beta$ if $$S' \rightarrow^* \alpha X\omega \rightarrow \alpha\beta\gamma\omega$$ by a right-most derivation.
- An item $I$ is valid for a viable prefix $\alpha(\alpha_1\dots\alpha_n)$ if the DFA accepts $\alpha_1\dots\alpha_n$ and terminates on some state $s$ containing $I$.
- The items in $s$ describe what the top of the item stack might be after reading input $\alpha_1\dots\alpha_n$.
LR(0) Parsing
- Assume
- Stack contains $\alpha_1\dots\alpha_k$
- Next input is $t$
- DFA on input $\alpha_1\dots\alpha_k$ teminates in state $s$
- Action
- Reduce by $X \rightarrow \beta$ if $s$ contains item $X \rightarrow \beta.$
- Shift if $s$ contains item $X \rightarrow \beta .t\omega$
- Conflict
- Reduce/Reduce if any state contains two reduce rule.
- Shift/Reduce if any state has a reduced item and a shift item
Simple LR Paring (SLR(1) Parsing)
- Assume
- Stack contains $\alpha_1\dots\alpha_k$
- Next input is $t$
- DFA on input $\alpha_1\dots\alpha_k$ teminates in state $s$
- Action
- Reduce by $X \rightarrow \beta$ if $s$ contains item $X \rightarrow \beta.$ and $t \in Follow(X)$
- Shift if $s$ contains item $X \rightarrow \beta .t\omega$
- If there are conflicts under these rules, the grammar is not SLR.
- We can use precedence declarations to resolve conflicts. * is grater than +.
LR(1)
- More powerful than SLR(1)
- Build by (item x lookahead).
- SLR(1) only checks whether lookahead is in follow set.